Quantifying the knowledge in Deep Neural Networks: an overview
Deep Neural Networks (DNNs) have proven to be extremely effective at learning a wide range of tasks. Due
to their complexity and frequently inexplicable internal state, DNNs are difficult to analyze: their black-box nature
makes it challenging for humans to comprehend their internal behavior. Several attempts to interpret their operation
have been made during the last decade, but analyzing deep neural models from the perspective of the knowledge
encoded in their layers is a very promising research direction, which has barely been touched upon. Such a research
approach could provide a more accurate insight into a DNN model, its internal state, learning progress, and knowledge
storage capabilities. The purpose of this survey is two-fold: a) to review the concept of DNN knowledge quantification
and highlight it as an important near-future challenge, as well as b) to provide a brief account of the scant existing
methods attempting to actually quantify DNN knowledge. Although a few such algorithms have been proposed, this
is an emerging topic still under investigation.
Neural Natural Language Processing for Long Texts: A Survey of the State-of-the-Art
The adoption of Deep Neural Networks (DNNs) has greatly benefited Natural
Language Processing (NLP) during the past decade. However, the demands of long
document analysis are quite different from those of shorter texts, while the
ever increasing size of documents uploaded on-line renders automated
understanding of long texts a critical area of research. This article has two
goals: a) it overviews the relevant neural building blocks, thus serving as a
short tutorial, and b) it surveys the state-of-the-art in long document NLP,
mainly focusing on two central tasks: document classification and document
summarization. Sentiment analysis for long texts is also covered, since it is
typically treated as a particular case of document classification.
Additionally, this article discusses the main challenges, issues and current
solutions related to long document NLP. Finally, the relevant, publicly
available, annotated datasets are presented, in order to facilitate further
research.
Comment: 53 pages, 2 figures, 171 citations
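A central practical obstacle in long-document NLP mentioned above is that neural encoders accept a bounded number of tokens. A common workaround (a generic sketch, not a method from the surveyed paper; the window and stride sizes are illustrative assumptions) is to split the token sequence into overlapping windows that are encoded separately and later aggregated:

```python
def chunk_tokens(tokens, max_len=512, stride=384):
    """Split a long token sequence into overlapping windows.

    Each window holds at most `max_len` tokens; consecutive windows
    start `stride` tokens apart, so they overlap by max_len - stride.
    """
    if max_len <= 0 or stride <= 0:
        raise ValueError("max_len and stride must be positive")
    chunks = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):
            break  # final window already covers the tail of the document
        start += stride
    return chunks

doc = list(range(1000))   # stand-in for a tokenized long document
windows = chunk_tokens(doc, max_len=512, stride=384)
print(len(windows))       # 3 overlapping windows
```

The overlap (here 128 tokens) lets sentences near a window boundary be seen with context from both sides before the per-window representations are pooled.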
Multimodal Stereoscopic Movie Summarization Conforming to Narrative Characteristics
Video summarization is a timely and rapidly developing research field with broad commercial interest, due to the increasing availability of massive video data. Relevant algorithms face the challenge of needing to achieve a careful balance between summary compactness, enjoyability, and content coverage. The specific case of stereoscopic 3D theatrical films has become more important over the past years, but has not received corresponding research attention. In this paper, a multi-stage, multimodal summarization process for such stereoscopic movies is proposed, which is able to extract a short, representative video skim conforming to narrative characteristics from a 3D film. At the initial stage, a novel, low-level video frame description method is introduced (frame moments descriptor) that compactly captures informative image statistics from luminance, color, optical flow, and stereoscopic disparity video data, at both a global and a local scale. Thus, scene texture, illumination, motion, and geometry properties may succinctly be contained within a single frame feature descriptor, which can subsequently be employed as a building block in any key-frame extraction scheme, e.g., for intra-shot frame clustering. The computed key-frames are then used to construct a movie summary in the form of a video skim, which is post-processed in a manner that also considers the audio modality. The next stage of the proposed summarization pipeline essentially performs shot pruning, controlled by a user-provided shot retention parameter, which removes segments from the skim based on the narrative prominence of movie characters in both the visual and the audio modalities. This novel process (multimodal shot pruning) is algebraically modeled as a multimodal matrix column subset selection problem, which is solved using an evolutionary computing approach. Subsequently, disorienting editing effects induced by summarization are dealt with, through manipulation of the video skim.
At the last step, the skim is suitably post-processed in order to reduce stereoscopic video defects that may cause visual fatigue.
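The idea of a moments-style frame descriptor can be illustrated with a toy sketch (this is not the authors' exact method; the choice of statistics, the 2x2 grid, and the four-channel layout are assumptions made here for illustration): per-channel first- and second-order statistics are computed over the whole frame and over a grid of local blocks, then concatenated into one compact vector.

```python
import numpy as np

def frame_moments(frame, grid=2):
    """Toy moments-style descriptor: per-channel mean and standard
    deviation, computed globally and on a grid x grid block partition,
    concatenated into a single feature vector."""
    h, w, c = frame.shape
    def moments(block):
        flat = block.reshape(-1, c)
        return np.concatenate([flat.mean(axis=0), flat.std(axis=0)])
    parts = [moments(frame)]            # global statistics
    bh, bw = h // grid, w // grid
    for i in range(grid):               # local statistics per block
        for j in range(grid):
            parts.append(moments(frame[i * bh:(i + 1) * bh,
                                       j * bw:(j + 1) * bw]))
    return np.concatenate(parts)

# Toy "frame" with 4 channels standing in for luminance, color,
# optical-flow magnitude, and stereoscopic disparity
rng = np.random.default_rng(0)
frame = rng.random((64, 64, 4))
desc = frame_moments(frame, grid=2)
print(desc.shape)   # 2 moments * 4 channels * (1 global + 4 blocks) = (40,)
```

Such fixed-length descriptors can then feed any off-the-shelf key-frame extraction scheme, e.g., intra-shot clustering of frames.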
GreekPolitics: Sentiment Analysis on Greek Politically Charged Tweets
The rapid growth of on-line social media platforms has rendered opinion mining/sentiment analysis a critical area of research. This paper focuses on analyzing Twitter posts (tweets), written in the Greek language and politically charged in content. This is a rather underexplored topic, due to the inadequacy of publicly available annotated datasets. Thus, we present and release GreekPolitics: a dataset of Greek tweets with politically charged content, annotated for four different sentiments: polarity, figurativeness, aggressiveness and bias. GreekPolitics has been evaluated comprehensively using state-of-the-art Deep Neural Networks (DNNs) and data augmentation methods. This paper details the dataset, the evaluation process and the experimental results.
Stereoscopic Medical Data Video Quality Issues
Stereoscopic medical videos are recorded, e.g., in stereo endoscopy or during video recording of medical/dental operations. This paper examines quality issues in the recorded stereoscopic medical videos, as insufficient quality may induce visual fatigue in doctors. No attention has been paid to stereo quality and ensuing fatigue issues in the scientific literature so far. Two of the most commonly encountered quality issues in stereoscopic data, namely stereoscopic window violations and bent windows, were searched for in stereo endoscopic medical videos. Furthermore, an additional stereo quality issue encountered in dental operation videos, namely excessive disparity, was detected and fixed. The conducted experiments prove the existence of such quality issues in stereoscopic medical data and highlight the need for their detection and correction.
Deep Reinforcement Learning with semi-expert distillation for autonomous UAV cinematography
Unmanned Aerial Vehicles (UAVs, or drones) have revolutionized modern media production. Being rapidly deployable “flying cameras”, they can easily capture aesthetically pleasing aerial footage of static or moving filming targets/subjects. Current approaches rely either on manual UAV/gimbal control by human experts or on a combination of complex computer vision algorithms and hardware configurations for automating the flying/filming process. This paper explores an efficient Deep Reinforcement Learning (DRL) alternative, which implicitly merges the target detection and path planning steps into a single algorithm. To achieve this, a baseline DRL approach is augmented with a novel policy distillation component, which transfers knowledge from a suitable, semi-expert Model Predictive Control (MPC) controller into the DRL agent. Thus, the latter is able to autonomously execute a specific UAV cinematography task with purely visual input. Unlike the MPC controller, the proposed DRL agent does not need to know the 3D world position of the filming target during inference. Experiments conducted in a photorealistic simulator showcase superior performance and training speed compared to the baseline agent, while surpassing the MPC controller in terms of visual occlusion avoidance.
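The policy distillation component described above can be sketched in a minimal, hedged form (this is not the paper's actual loss; the mean-squared imitation term, the mixing weight `beta`, and the toy values are assumptions made here for illustration): the agent's usual DRL objective is combined with a behavioral-cloning term pulling its actions toward those of the semi-expert MPC teacher.

```python
import numpy as np

def distilled_loss(student_actions, teacher_actions, rl_loss, beta=0.5):
    """Toy distillation objective for continuous control: the total
    loss mixes the ordinary DRL loss with an imitation (behavioral
    cloning) term penalizing deviation from the teacher's actions."""
    imitation = np.mean((student_actions - teacher_actions) ** 2)
    return rl_loss + beta * imitation

# Toy example: a batch of 2 two-dimensional control actions
student = np.array([[0.0, 1.0],
                    [1.0, 0.0]])
teacher = np.zeros((2, 2))   # stand-in for MPC controller outputs
total = distilled_loss(student, teacher, rl_loss=1.0, beta=0.5)
print(total)                 # 1.0 + 0.5 * 0.5 = 1.25
```

Because the teacher term only shapes training, the distilled agent can be deployed without the MPC controller, and hence without the target's 3D world position, at inference time.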